The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We propose a framework for constructing goodness‐of‐fit tests in both low and high dimensional linear models. We advocate applying regression methods to the scaled residuals following either an ordinary least squares or lasso fit to the data, and using some proxy for prediction error as the final test statistic. We call this family residual prediction tests. We show that simulation can be used to...
Functional data analysis is now a well‐established discipline of statistics, with its core concepts and perspectives in place. Despite this, there are still fundamental statistical questions which have received relatively little attention. One of these is the systematic construction of confidence regions for functional parameters. This work is concerned with developing, understanding and visualizing...
Expectation propagation (EP) is a widely successful algorithm for variational inference. EP is an iterative algorithm used to approximate complicated distributions, typically to find a Gaussian approximation of posterior distributions. In many applications of this type, EP performs extremely well. Surprisingly, despite its widespread use, there are very few theoretical guarantees on Gaussian EP, and...
The Wasserstein distance is an attractive tool for data analysis but statistical inference is hindered by the lack of distributional limits. To overcome this obstacle, for probability measures supported on finitely many points, we derive the asymptotic distribution of empirical Wasserstein distances as the optimal value of a linear programme with random objective function. This facilitates statistical...
We develop a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) using the framework of minimax variance and a new method called kernel allocation. We show that any notion of a priori balance must go hand in hand with a notion of structure, since with no structure on the dependence of outcomes on baseline covariates...
We investigate the problem of testing whether d possibly multivariate random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two‐variable Hilbert–Schmidt independence criterion but allows for an arbitrary number of variables. We embed the joint distribution and the product of the marginals in a reproducing kernel Hilbert space...
In most real life studies, auxiliary variables are available and are employed to explain and understand missing data patterns and to evaluate and control causal relationships with variables of interest. Usually their availability is assumed to be a fact, even if the variables are measured without the objectives of the study in mind. As a result, inference with missing data and causal inference require...
Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co‐ordinates. The challenge is to borrow strength across the co‐ordinates to detect smaller changes than could be observed in any individual component series. We propose a two‐stage...
We present a novel inference methodology to perform Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to enable evolution of the intensity function over discrete time. The novelty of the method lies on the fact that no discretization error is involved despite the non‐tractability of...
Distance‐weighted discrimination (DWD) is a modern margin‐based classifier with an interesting geometric motivation. It was proposed as a competitor to the support vector machine (SVM). Despite many recent references on DWD, DWD is far less popular than the SVM, mainly because of computational and theoretical reasons. We greatly advance the current DWD methodology and its learning theory. We propose...
Significance analysis of microarrays (SAM) is a highly popular permutation‐based multiple‐testing method that estimates the false discovery proportion (FDP): the fraction of false positive results among all rejected hypotheses. Perhaps surprisingly, until now this method had no known properties. This paper extends SAM by providing 1−α upper confidence bounds for the FDP, so that exact confidence statements...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.